space constraint
tion error; right: surprise. α is a hyperparameter we scanned for. Implement a new IM baseline: ICM (Pathak 2017 [23]
We thank the reviewers for the thorough feedback. Based on it, we have made numerous improvements. Original code is for discrete actions.) IM baseline with the random object. The plot is similar to "tool" in Figure 1 and we omit it due to space constraints. Rev. #1 suggested that the environments could be solved by classic planning methods.
Rethinking the Reverse-engineering of Trojan Triggers
Deep Neural Networks are vulnerable to Trojan (or backdoor) attacks. Reverse-engineering methods can reconstruct the trigger and thus identify affected models. Existing reverse-engineering methods only consider input space constraints, e.g., trigger size in the input space. Specifically, they assume the triggers are static patterns in the input space and fail to detect models with feature space triggers such as image style transformations. We observe that both input-space and feature-space Trojans are associated with feature space hyperplanes. Based on this observation, we design a novel reverse-engineering method that exploits the feature space constraint to reverse-engineer Trojan triggers. Results on four datasets and seven different attacks demonstrate that our solution effectively defends against both input-space and feature-space Trojans. It outperforms state-of-the-art reverse-engineering methods and other types of defenses in both Trojaned model detection and mitigation tasks. On average, the detection accuracy of our method is 93%. For Trojan mitigation, our method can reduce the ASR (attack success rate) to only 0.26% with the BA (benign accuracy) remaining nearly unchanged.
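The two mitigation metrics quoted above, ASR and BA, can be illustrated with a minimal sketch. The function names and the convention of excluding samples whose true label already equals the target label are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def benign_accuracy(pred_clean, labels):
    """BA: fraction of clean inputs classified with their true label."""
    return float(np.mean(np.asarray(pred_clean) == np.asarray(labels)))

def attack_success_rate(pred_triggered, labels, target_label):
    """ASR: fraction of trigger-stamped inputs classified as the target label.

    Assumption: samples whose true label is already the target label are
    excluded, since they would count as "successes" without any attack.
    """
    pred = np.asarray(pred_triggered)
    labels = np.asarray(labels)
    mask = labels != target_label
    return float(np.mean(pred[mask] == target_label))
```

A successful mitigation, in these terms, drives `attack_success_rate` toward zero while leaving `benign_accuracy` close to its value before mitigation.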
On Robustness of Principal Component Regression: Author Response
We begin by thanking all reviewers for their extremely encouraging and helpful responses. We agree that the fact that we perform PCR on both the training and testing covariates should be more explicitly placed in the context of transductive semi-supervised learning. We have strived to interpret our major theorem results (Thm 4.2 & Thm 5.1) by: (i) providing examples of natural generating Proposition 4.2, should be tight). Their empirical results support our theoretical guarantees.
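To make the transductive aspect concrete, the following is a minimal sketch of PCR in which the principal components are computed from the stacked training and testing covariates. The function name `pcr_transductive`, the absence of centering, and the user-supplied number of components `k` are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def pcr_transductive(X_train, y_train, X_test, k):
    """Principal component regression using train AND test covariates.

    Hypothetical sketch: the top-k right singular vectors are estimated
    from the stacked covariate matrix, so the unlabeled test covariates
    inform the subspace (the transductive part). Responses are only
    needed for the training rows.
    """
    # Stack covariates; no centering is applied in this sketch.
    X_all = np.vstack([X_train, X_test])
    # Rows of Vt are the right singular vectors, sorted by singular value.
    _, _, Vt = np.linalg.svd(X_all, full_matrices=False)
    V_k = Vt[:k].T  # shape (p, k)
    # Project both sets of covariates onto the shared k-dim subspace.
    Z_train = X_train @ V_k
    Z_test = X_test @ V_k
    # Ordinary least squares in the projected space.
    gamma, *_ = np.linalg.lstsq(Z_train, y_train, rcond=None)
    return Z_test @ gamma
```

When the covariates are exactly rank `k` and the responses are noise-free linear functions of them, this procedure recovers the test responses exactly; with noisy or corrupted covariates, the projection acts as a denoising step before regression.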